Whose sample is it anyway? Widespread misannotation of samples in transcriptomics studies
Abstract
Concern about the reproducibility and reliability of biomedical research has been rising. An understudied issue is the prevalence of sample mislabeling, one consequence of which is invalid comparisons. We studied this issue in a corpus of human transcriptomics studies by comparing the provided annotations of sex to the expression levels of sex-specific genes. We identified apparently mislabeled samples in 46% of the datasets studied, yielding a 99% confidence lower-bound estimate of 33% for all studies. In a separate analysis of a set of datasets concerning a single cohort of subjects, two of the four datasets had mislabeled samples, indicating laboratory mix-ups rather than data recording errors. Although the number of mixed-up samples per study was generally small, our method can identify only a subset of potential mix-ups, so our estimate of the breadth of the problem is conservative. Our findings emphasize the need for more stringent sample tracking and for re-users of published data to be alert to the possibility of annotation and labeling errors.
This article is included in the Preclinical Reproducibility and Robustness gateway. This article is included in the INCF gateway.
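As a rough illustration of the check the abstract describes (comparing annotated sex with the expression of sex-specific genes), here is a minimal Python sketch. The marker genes (XIST as a female-expressed gene; RPS4Y1 and KDM5D as Y-linked genes), the simple comparison rule, the function names, and the toy values are all assumptions made for illustration. The abstract does not state which genes or decision rule the authors used.

```python
from typing import Dict, List

# Assumed marker genes (not named in the abstract): XIST is expressed
# mainly in female samples; RPS4Y1 and KDM5D are located on the Y chromosome.
FEMALE_MARKER = "XIST"
MALE_MARKERS = ("RPS4Y1", "KDM5D")


def infer_sex(expression: Dict[str, float]) -> str:
    """Guess sex from log-scale expression of the assumed marker genes.

    Hypothetical rule: call a sample female when the female marker exceeds
    the mean of the male markers, otherwise male.
    """
    male_signal = sum(expression[g] for g in MALE_MARKERS) / len(MALE_MARKERS)
    return "female" if expression[FEMALE_MARKER] > male_signal else "male"


def flag_discordant(samples: List[dict]) -> List[str]:
    """Return IDs of samples whose annotated sex disagrees with the inferred sex."""
    return [
        s["id"]
        for s in samples
        if infer_sex(s["expression"]) != s["annotated_sex"]
    ]


# Toy data with made-up expression values, for illustration only.
toy_samples = [
    {"id": "s1", "annotated_sex": "female",
     "expression": {"XIST": 9.5, "RPS4Y1": 2.1, "KDM5D": 1.8}},
    {"id": "s2", "annotated_sex": "male",
     "expression": {"XIST": 1.9, "RPS4Y1": 10.2, "KDM5D": 8.7}},
    {"id": "s3", "annotated_sex": "male",
     "expression": {"XIST": 9.1, "RPS4Y1": 2.0, "KDM5D": 2.2}},
]

if __name__ == "__main__":
    print(flag_discordant(toy_samples))
```

A check of this kind can only catch swaps between samples of different annotated sex, which is consistent with the abstract's point that the resulting estimate is conservative.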
Publication date: 2017